Conversation
Signed-off-by: Vincent Caux-Brisebois <vcauxbrisebo@nvidia.com>
…a unbind hardening Signed-off-by: Vincent Caux-Brisebois <vcauxbrisebo@nvidia.com>
6ab54d1 to
199a712
Compare
| vm: | ||
| name: VM Checks | ||
| runs-on: build-amd64 |
There was a problem hiding this comment.
maybe for now we can leave these out. we'll add tests once we get things over to the driver architecture.
There was a problem hiding this comment.
same as above. i think we stash this. once we land the initial implementation lets add the vm to our e2e tests.
There was a problem hiding this comment.
nit: we dont want to check this in. i capture the plan for posterity under OS-53.
| # ./build-rootfs.sh [--base] [--arch aarch64|x86_64] [output_dir] | ||
| # ./build-rootfs.sh [--base] [--gpu] [--arch aarch64|x86_64] [output_dir] |
There was a problem hiding this comment.
Will we need to publish two rootfs'?
There was a problem hiding this comment.
plan here is to ship a single kernel for both gpu and cpu? (i think this makes sense for now, just checking)
There was a problem hiding this comment.
we want to avoid adding the whole vm dep here if possible. lets talk about this more since i don't know if it's straightforward to remove or not.
|
Also ran this through codex review, its feedback looks pretty good: The patch introduces two functional blockers in the new GPU path: the gateway CLI drops the VFIO bind guard as soon as deployment returns, and openshell-vm --gpu still boots the non-GPU rootfs by default. It also leaves host IPv4 Full review comments:
|
Summary
Adds VFIO GPU passthrough support to openshell-vm using cloud-hypervisor as a second VMM backend alongside libkrun. Includes a full GPU bind/unbind lifecycle with safety checks, nvidia driver deadlock hardening (subprocess isolation with timeout, pre-unbind module cleanup, post-timeout verification), and an RAII guard that restores the original driver on exit.
Related Issue
N/A
Changes
Testing
mise run pre-commitpassesChecklist